Batching content management operations to facilitate efficient database interactions

ABSTRACT

Batching content management operations to facilitate efficient database interactions is disclosed. Two or more content management-related requests are received. The two or more content management-related requests are treated as a batch, including by formulating and sending to a database in a single database interaction a grouped request to add, delete, or modify each of a plurality of database records.

CROSS REFERENCE TO OTHER APPLICATIONS

This application is a continuation of co-pending U.S. patent application Ser. No. 13/603,065, entitled BATCHING CONTENT MANAGEMENT OPERATIONS TO FACILITATE EFFICIENT DATABASE INTERACTIONS, filed Sep. 4, 2012, which is incorporated herein by reference for all purposes, which is a continuation of U.S. patent application Ser. No. 12/005,061, now U.S. Pat. No. 8,280,917, entitled BATCHING CONTENT MANAGEMENT OPERATIONS TO FACILITATE EFFICIENT DATABASE INTERACTIONS, filed Dec. 21, 2007, which is incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Content management solutions facilitate the creation, storage, retrieval, promotion (e.g., through a review/approval and/or other business process or work flow), retention, migration, and/or destruction of content, typically in the context of a relatively large body of content. A wide variety of regulatory and other legal and/or business requirements prescribe a manner and/or duration of retention of certain content. In some environments, large volumes of similar content objects, e.g., email messages or other communications, ecommerce or other transaction records, stock quotes, etc., must be ingested relatively quickly into a content management system. A content management system typically uses a database, such as a relational database management system (RDBMS), to store metadata associated with content items (e.g., documents or other files or objects) under management of the content management system. In a typical content management system, for each such content item that is added to a body of content being managed by the content management system, one or more objects must be created and/or associated data stored (or updated) in a database, which typically results in one or more database interactions being performed for each content item that is ingested. Other common and/or repetitive interactions by a client and/or application with a typical content management system similarly can result in inefficient interactions with the database. In a typical content management system, some efficiency may be attained by associating related operations together into a single database “transaction”, but even then some inefficiency remains, e.g., the RDBMS typically inserts (or updates) each row individually, resulting in more network transfers and processing overhead.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram illustrating an embodiment of a content management system.

FIG. 2 is a flow chart illustrating an embodiment of a prior art process for performing content management commands and/or operations.

FIG. 3 is a flow chart illustrating an embodiment of a process for batching content management commands and/or operations.

FIG. 4 is a flow chart illustrating an embodiment of a process for performing batched content management operations.

FIG. 5 is a flow chart illustrating an embodiment of a process for determining whether an intra-batch flush criterion has been met.

FIG. 6 is a flow chart illustrating an embodiment of a process for flushing a queue of batched content management-related commands and/or operations.

FIG. 7 is a block diagram illustrating an embodiment of a content management system configured to batch content management-related commands and/or operations.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process, an apparatus, a system, a composition of matter, a computer readable medium such as a computer readable storage medium, or a computer network wherein program instructions are sent over optical or communication links. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. A component such as a processor or a memory described as being configured to perform a task includes either a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

Batching content management operations to facilitate efficient database interactions is disclosed. In some embodiments, a group of related content management commands or operations is processed by a content management system as a batch. Rather than performing each requested operation in series, and interacting separately with a database for each content management operation that requires database interaction, related (also referred to herein as “batched”) operations and/or associated database interactions are performed collectively. For example, in some embodiments a series of successive content management operations, such as creating and storing, for each of a series of content items being ingested into a body of managed content, a corresponding object to represent the content item (e.g., in a body of metadata), is treated as a group or “batch”. Under prior approaches, data comprising each such object may be sent to the database in an associated “save” or other database command or interaction. In at least some prior systems, some efficiency may be achieved by grouping multiple such save operations into a single database transaction, which lowers database overhead by allowing the database to wait until all the database commands associated with the transaction have been received before “committing” changes to the database. However, even if multiple successive and/or repetitive commands are included in a single database transaction, the RDBMS typically inserts (or updates) each row individually, resulting in more network transfers and processing overhead. Using the batching technique disclosed herein, in some embodiments multiple updates are gathered by the content management system and handed to the database in a single database interaction, which enables the database to update multiple rows in a single operation, further reducing overhead.
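The contrast between row-by-row insertion and a single grouped interaction can be illustrated with a minimal sketch using Python's built-in sqlite3 module. SQLite merely stands in for whatever RDBMS a content management system actually uses, and the `objects` table with its `id` and `name` columns is a hypothetical example. The first function mirrors the prior approach (one INSERT per row, all inside one transaction); the second hands every row to the database in one multi-row INSERT statement.

```python
import sqlite3


def insert_rows_individually(conn, rows):
    """Prior approach: one INSERT statement per row, all inside one transaction."""
    statements = 0
    for object_id, name in rows:
        conn.execute("INSERT INTO objects (id, name) VALUES (?, ?)", (object_id, name))
        statements += 1
    conn.commit()  # a single commit, but the rows were still sent one at a time
    return statements


def insert_rows_batched(conn, rows):
    """Batched approach: one multi-row INSERT statement carrying every row."""
    if not rows:
        return 0
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    params = [value for row in rows for value in row]
    conn.execute(f"INSERT INTO objects (id, name) VALUES {placeholders}", params)
    conn.commit()
    return 1


def new_database():
    """Create an in-memory database with the hypothetical objects table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


rows = [(i, f"object {i}") for i in range(1, 101)]
serial_conn, batch_conn = new_database(), new_database()
print(insert_rows_individually(serial_conn, rows))  # 100
print(insert_rows_batched(batch_conn, rows))        # 1
print(batch_conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0])  # 100
```

A real implementation would likely split very large batches into chunks, since databases cap the number of bound parameters per statement (in SQLite, the `SQLITE_MAX_VARIABLE_NUMBER` limit).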

In some embodiments, an API or other interface is provided that enables an application or other process to “batch” content management commands, for example by providing an explicit indication to “begin batch” and “end batch”. In some embodiments, an indication to begin/end a batch is implicit in other actions or events, e.g., commands indicated as being part of a single database transaction in some embodiments may also be treated as comprising a batch of content management commands.
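Both kinds of indication can be sketched with a small session object, assuming a hypothetical `BatchingSession` class (not the API of any real content management product). `begin_batch`/`end_batch` model the explicit indication, while the `transaction()` context manager models the implicit case in which transaction boundaries double as batch boundaries. Appending to `performed` stands in for the grouped database interaction.

```python
from contextlib import contextmanager


class BatchingSession:
    """Hypothetical client-side session; not the API of any real product."""

    def __init__(self):
        self.in_batch = False
        self.queue = []      # requests held while a batch is open
        self.performed = []  # requests actually handed on for execution

    def begin_batch(self):
        self.in_batch = True

    def end_batch(self):
        self.in_batch = False
        self.performed.extend(self.queue)  # stands in for one grouped database interaction
        self.queue.clear()

    def request(self, operation):
        if self.in_batch:
            self.queue.append(operation)
        else:
            self.performed.append(operation)

    @contextmanager
    def transaction(self):
        """Implicit batching: the transaction's boundaries double as batch boundaries."""
        self.begin_batch()
        try:
            yield self
        except Exception:
            self.queue.clear()  # abandon the queued requests if the transaction fails
            self.in_batch = False
            raise
        self.end_batch()


session = BatchingSession()
session.begin_batch()  # explicit indication
session.request("save object 1")
session.end_batch()
with session.transaction():  # implicit indication
    session.request("save object 2")
    session.request("save object 3")
    assert session.performed == ["save object 1"]  # batch not yet ended
print(session.performed)  # ['save object 1', 'save object 2', 'save object 3']
```

Discarding the queue when the transaction raises is one possible design choice for this sketch; the text above does not specify failure behavior.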

In some embodiments, if content management commands and/or operations are included in a batch, a requesting application, process, user, or other entity must expect that associated data may not be reflected in the database until the entire batch has been processed. In some embodiments, the content management system may, but will not necessarily, update the database prior to receiving an indication that an end of a batch has been reached. For example, other criteria (insufficient memory, interdependencies, caching policies, etc.) may result in content management commands being performed and/or reflected in the database before the end of a batch is indicated and/or reached.

FIG. 1 is a block diagram illustrating an embodiment of a content management system. One or more clients 102 connect via a network 104 to a content server 106 configured to manage and provide access to a body of content stored in a content store 108. For each content item in content store 108, corresponding metadata is stored in a metadata store 110. In some embodiments, each content item in content store 108 is represented in metadata stored in metadata store 110 by one or more objects configured to provide content management related functionality with respect to the content item. Data comprising each metadata object is stored in some embodiments in one or more database tables, e.g., in a relational database management system (RDBMS). In some embodiments a content management client on client 102 communicates with content server 106 via network 104 to make content management related services available to applications running on client 102. In some embodiments, client applications on client 102 use a content management framework associated with the content management client to access content management related services. For example, such a client application may be configured to store a new content item (e.g., a document or other object) by invoking the content management client (or the server 106 directly) to create and save a new object, e.g., a new object configured and/or usable to save in content store 108 content data comprising the content item and/or to represent the content item in metadata store 110. Similarly, to retrieve data, a client application and/or content management client would communicate with content server 106 via network 104. The client 102 may provide, for example, an identifier associated with a particular content item and/or one or more search criteria, such as a query to search for items created by a particular author on a specified date. In the case of retrieval of a specified content item, the content server 106 would use the provided identifier to retrieve the content item from the content store 108 and provide it to the client 102 via network 104. In the case of a query, the content server 106 would query metadata store 110 to identify responsive objects. In some embodiments, metadata associated with responsive objects is sent via network 104 to client 102, where a user and/or process may select one or more responsive objects for retrieval.

In some environments, a very large number of similar content management operations may be required to be performed in a very short period of time. For example, emails or other messages may be required to be archived as they are generated, sent, and/or received, e.g., in a large enterprise environment. Or, a preexisting body of content may be required to be imported into a content management system. In such cases, a client application or other process on client 102 typically would be configured to process content items serially, potentially invoking over and over again the same content management system commands and/or operations.

FIG. 2 is a flow chart illustrating an embodiment of a prior art process for performing content management commands and/or operations. In the example shown, each content management-related operation is performed as it is received (202 and 204) until all requested operations have been performed (206).

Batching content management-related commands and/or operations is disclosed. In some embodiments, batched commands and/or operations will not necessarily (but may) be reflected in an applicable database, e.g., content store 108 and/or metadata store 110, until an end of the batch is reached and/or indicated. In some embodiments, an indication is provided by a requesting entity that one or more content management-related operations and/or commands may be treated as a related batch of operations. In some embodiments, commands and/or operations may be batched based on some criterion other than an explicit indication from a requesting entity, e.g., to coincide with database transaction boundaries (as indicated by a requesting entity or otherwise) and/or based on other events and/or indications. In some embodiments, a requesting entity (e.g., a client application) understands that changes associated with batched commands and/or operations may not be reflected in an applicable database until an end of the batch is reached and/or indicated. This allows the content management system (e.g., content server 106) to queue commands and/or operations in order to optimize interactions with a database such as metadata store 110 by treating the batched operations as a group. For example, instead of sending to the database a thousand successive requests, each requesting insertion of a new row corresponding to a new object, the content server may, in a single interaction with the database, request insertion of all one thousand rows.
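The thousand-row example can be made concrete by counting round trips. The sketch below assumes a hypothetical `CountingDatabase` that does nothing except record how many times it is called and which rows it received; `ingest_serially` follows the serial pattern of FIG. 2, and `ingest_as_batch` queues the requests and sends them in one interaction.

```python
class CountingDatabase:
    """Hypothetical stand-in for an RDBMS that only records round trips and rows."""

    def __init__(self):
        self.interactions = 0
        self.rows = []

    def insert_rows(self, rows):
        self.interactions += 1  # one call is one database interaction, however many rows
        self.rows.extend(rows)


def ingest_serially(db, new_objects):
    """Prior approach: one insertion request per new object."""
    for obj in new_objects:
        db.insert_rows([obj])


def ingest_as_batch(db, new_objects):
    """Batched approach: queue the requests, then request every insertion at once."""
    queue = list(new_objects)
    if queue:
        db.insert_rows(queue)


objects = [{"object_id": i} for i in range(1000)]
serial_db, batch_db = CountingDatabase(), CountingDatabase()
ingest_serially(serial_db, objects)
ingest_as_batch(batch_db, objects)
print(serial_db.interactions, len(serial_db.rows))  # 1000 1000
print(batch_db.interactions, len(batch_db.rows))    # 1 1000
```

Both paths store the same rows; only the number of interactions differs, which is the overhead the batching technique targets.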

FIG. 3 is a flow chart illustrating an embodiment of a process for batching content management commands and/or operations. In the example shown, a request to perform a content management-related operation (e.g., a command) is received (302). If the request is part of a batch (304), it is placed in a queue, for example with one or more other requests in the batch. Otherwise, it is performed (308). Steps 302-308 are repeated, as applicable, as and if subsequent requests are received (310).
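The FIG. 3 loop can be sketched as a simple dispatcher. This assumes a hypothetical `Request` record carrying an `in_batch` flag; how a real system decides batch membership (explicit markers, transaction boundaries, etc.) is left open in the text. Requests outside a batch go straight to the `perform` callback, and batched requests are returned still queued.

```python
from dataclasses import dataclass


@dataclass
class Request:
    operation: str
    in_batch: bool  # hypothetical flag: whether the request belongs to a batch


def handle_requests(requests, perform):
    """Sketch of FIG. 3: queue batched requests, perform all others immediately."""
    queue = []
    for request in requests:       # 302/310: receive each request as it arrives
        if request.in_batch:       # 304: is the request part of a batch?
            queue.append(request)  # hold it with the other batched requests
        else:
            perform(request)       # 308: perform it now
    return queue


performed = []
pending = handle_requests(
    [
        Request("create object 1", True),
        Request("query by author", False),
        Request("save object 1", True),
    ],
    performed.append,
)
print([r.operation for r in performed])  # ['query by author']
print([r.operation for r in pending])    # ['create object 1', 'save object 1']
```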

In some embodiments, the beginning and ending of a batch is or may be indicated explicitly by a requesting entity, such as a client application. For example, in some embodiments a client application that wants a plurality of objects 1 to n to be created and saved in succession would indicate to the content management system that the operations may and/or should be treated as a batch, using syntax such as the following:

begin batch
    create object 1
    save object 1
    create object 2
    save object 2
    . . .
    create object n
    save object n
end batch
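One way a receiving system might recognize these explicit markers is sketched below, assuming commands arrive as plain strings in exactly the form shown. The `split_batches` function is hypothetical: it groups everything between `begin batch` and `end batch` into a list and passes other commands through individually. Nested or unterminated batches are simply rejected here.

```python
def split_batches(commands):
    """Group commands between 'begin batch' and 'end batch' markers.

    Returns a list whose items are either a list (one batch) or a single
    command string received outside any batch.
    """
    result, current = [], None
    for command in commands:
        if command == "begin batch":
            if current is not None:
                raise ValueError("nested batches are not supported in this sketch")
            current = []
        elif command == "end batch":
            if current is None:
                raise ValueError("'end batch' without 'begin batch'")
            result.append(current)
            current = None
        elif current is not None:
            current.append(command)  # inside a batch: hold for group processing
        else:
            result.append(command)   # outside any batch: handled on its own
    if current is not None:
        raise ValueError("batch was never ended")
    return result


n = 3
script = ["begin batch"]
for i in range(1, n + 1):
    script += [f"create object {i}", f"save object {i}"]
script.append("end batch")
batches = split_batches(script)
print(len(batches), len(batches[0]))  # 1 6
```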

FIG. 4 is a flow chart illustrating an embodiment of a process for performing batched content management operations. In the example shown, batched operations are performed, e.g., as a group, upon receipt of an indication that an intra-batch “flush” criterion has been met (402 and 404) and/or upon receipt of an indication that an end of the batch has been indicated and/or reached (406 and 408). Examples of an intra-batch flush criterion include receipt of a query or other request implicating one or more objects that are and/or may be affected by a command in the batch queue, other dependencies on objects in the queue, and approaching a state in which the memory available to queue batched commands is full. The example shown in FIG. 4 illustrates that in some embodiments previously received batched operations may be performed before an end of the batch is indicated and/or received. In such embodiments, a requesting entity that sent the batched requests and indicated that they should be treated as a batch must be configured to assume that the operations will not necessarily, but may (e.g., if the determination at 402 is affirmative), be reflected in an applicable database.
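A minimal sketch of the FIG. 4 behavior follows, assuming a hypothetical `BatchProcessor` class. The flush criterion is passed in as a callable and, for brevity, is checked each time an operation is queued; a fuller implementation would also evaluate it when queries or other requests arrive. In the usage below, a queue limit of two operations stands in for "memory nearly full".

```python
class BatchProcessor:
    """Hypothetical sketch of FIG. 4: flush on an intra-batch criterion or at end of batch."""

    def __init__(self, send_group, should_flush):
        self.send_group = send_group      # performs a group of operations together
        self.should_flush = should_flush  # callable(queue) -> bool
        self.queue = []

    def add(self, operation):
        self.queue.append(operation)
        if self.should_flush(self.queue):  # 402: intra-batch flush criterion met?
            self.flush()                   # 404

    def end_batch(self):                   # 406: end of batch indicated/reached
        self.flush()                       # 408

    def flush(self):
        if self.queue:
            self.send_group(list(self.queue))
            self.queue.clear()


sent = []
processor = BatchProcessor(send_group=sent.append, should_flush=lambda queue: len(queue) >= 2)
for op in ["save object 1", "save object 2", "save object 3"]:
    processor.add(op)
processor.end_batch()
print(sent)  # [['save object 1', 'save object 2'], ['save object 3']]
```

The first group reaches the database before the batch ends, which is why the requesting entity must not assume that nothing is visible until `end_batch`.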

FIG. 5 is a flow chart illustrating an embodiment of a process for determining whether an intra-batch flush criterion has been met. In some embodiments, the process of FIG. 5 is used to implement 402 of FIG. 4. It is concluded that an intra-batch flush criterion has been met (508) upon receiving any of the following: a query implicating an object affected by a batched command and/or operation that has not yet been performed (502); another command that depends on an object affected by such a command and/or operation (504), for example from a process or other entity other than the one that requested the batched command and/or operation; or an indication that a memory used to queue batched commands and/or requests is full (506). The process continues until an end of the batch is reached (510).
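The three checks of FIG. 5 can be sketched as a single predicate. This assumes a hypothetical `PendingOperation` record keyed by object identifier, and it approximates the memory check with a simple count (`capacity`), which is an illustrative simplification rather than a real memory measurement.

```python
from dataclasses import dataclass


@dataclass
class PendingOperation:
    object_id: str
    command: str


def flush_criterion_met(queue, query_objects=(), dependent_objects=(), capacity=1000):
    """Sketch of FIG. 5; capacity is a hypothetical stand-in for the queue memory limit."""
    pending = {op.object_id for op in queue}
    if pending & set(query_objects):      # 502: query touches a not-yet-performed operation
        return True                       # 508
    if pending & set(dependent_objects):  # 504: another command depends on a queued object
        return True                       # 508
    if len(queue) >= capacity:            # 506: queue memory is (treated as) full
        return True                       # 508
    return False


queue = [PendingOperation("doc-1", "save"), PendingOperation("doc-2", "save")]
print(flush_criterion_met(queue, query_objects=["doc-3"]))      # False
print(flush_criterion_met(queue, query_objects=["doc-2"]))      # True
print(flush_criterion_met(queue, dependent_objects=["doc-1"]))  # True
print(flush_criterion_met(queue, capacity=2))                   # True
```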

FIG. 6 is a flow chart illustrating an embodiment of a process for flushing a queue of batched content management-related commands and/or operations. In some embodiments, the process of FIG. 6 is used to implement 404 and 408 of FIG. 4. To flush the queue, pending commands are evaluated and grouped for efficient database interaction (602). For example, a set of paired commands to “create” and “save” objects associated with content items being added to a body of managed content may be identified; the object operations are performed at the content server, and the corresponding database requests, for example to insert for each new object a corresponding row in an associated database table, are sent to the database as a group. Associated database commands and/or requests are formulated, queued, and sent to the database (604) in accordance with the determination at 602.
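The grouping step of FIG. 6 can be sketched for the create/save pairing described above, again using sqlite3 as a stand-in database. The `(verb, object_id, name)` command format and the `objects` table are hypothetical. "create" builds the object in memory at the content server, "save" marks it for persistence, and all saved objects are then written with one multi-row INSERT.

```python
import sqlite3


def flush_queue(conn, pending):
    """Sketch of FIG. 6 using a hypothetical (verb, object_id, name) command format."""
    created = {}    # 602: objects built in memory at the content server
    to_insert = []  # 602: rows grouped for one database request
    for verb, object_id, name in pending:
        if verb == "create":
            created[object_id] = name
        elif verb == "save":
            to_insert.append((object_id, created[object_id]))
        else:
            raise ValueError(f"unsupported command in this sketch: {verb}")
    if to_insert:   # 604: formulate and send one grouped request
        placeholders = ", ".join(["(?, ?)"] * len(to_insert))
        params = [value for row in to_insert for value in row]
        conn.execute(f"INSERT INTO objects (id, name) VALUES {placeholders}", params)
        conn.commit()
    pending.clear()
    return len(to_insert)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT)")
pending = [("create", 1, "report.pdf"), ("save", 1, None),
           ("create", 2, "memo.txt"), ("save", 2, None)]
print(flush_queue(conn, pending))  # 2
print(conn.execute("SELECT id, name FROM objects ORDER BY id").fetchall())
# [(1, 'report.pdf'), (2, 'memo.txt')]
```

For brevity, this sketch does not handle a "save" without a preceding "create", repeated saves of one object, or batches large enough to exceed the database's parameter limit.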

FIG. 7 is a block diagram illustrating an embodiment of a content management system configured to batch content management-related commands and/or operations. In the example shown, content server 106 includes a network interface 702 used to communicate with clients such as client 102 via a network such as network 104. The content server 106 also includes an object manager 704 configured to manage objects created, accessed, and/or otherwise used by requesting clients and/or internal processes. The object manager 704 communicates with clients via the network interface 702, e.g., to receive content data, content management-related commands, etc., and conversely to send requested data and/or information responsive to requests, such as query results. The object manager 704 is configured to interact via a database interface 706 with one or more relational database management systems (RDBMS), e.g., to add or modify data in a content store such as content store 108 and/or a metadata store such as metadata store 110. In various embodiments, one or more of the object manager 704 and/or the database interface 706 is/are configured to support batching of content management-related commands and/or operations as disclosed herein, for example by implementing all or part of one or more of the processes of FIGS. 3-6.

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

What is claimed is:
 1. A method for managing stored content, comprising: receiving a plurality of content management-related requests originally assigned as part of a batch, wherein the batch of content management-related requests comprises requests that are originally batched together to be sent to a database before a change to the database corresponding to the requests is committed; and using a processor to determine that a first request among the plurality of received content management-related requests has a dependency upon at least one of the plurality of content management-related requests that is received prior to the first request, and in response, processing all of the content management-related requests that are received prior to the first request together as the batch, comprising: sending all of the content management-related requests that are received prior to the first request to the database in a single database interaction; and delaying to perform the first request and any content management-related requests that are received after the first request.
 2. The method of claim 1, further comprising receiving from a requesting entity with which the plurality of content management-related requests is associated an indication that the plurality of content management-related requests is associated.
 3. The method of claim 2, wherein the indication comprises an explicit indication that the plurality of content management-related requests comprises a batch.
 4. The method of claim 1, wherein processing all of the content management-related requests that are received prior to the first request together as the batch is performed in the event it is determined to flush the received content management-related requests even if an end of the batch has not been reached or indicated.
 5. The method of claim 1, further comprising: using a processor to determine whether a query is received, wherein the query implicates an object that is affected by an as yet unperformed received request; and in response to receiving the query, processing only the content management-related requests that are received prior to the first request together as the batch.
 6. The method of claim 1, further comprising: storing the plurality of content management-related requests; using a processor to determine whether an indication that a memory being used to store the content management-related requests is full or nearly full; and in response to receiving the indication, processing only a portion of the plurality of content management-related requests together as the batch.
 7. A content management system, comprising: a communication interface configured to receive a plurality of content management-related requests originally assigned as part of a batch, wherein the batch of content management-related requests comprises requests that are originally batched together to be sent to a database before a change to the database corresponding to the requests is committed; and a processor configured to: determine that a first request among the plurality of received content management-related requests has a dependency upon at least one of the plurality of content management-related requests that is received prior to the first request, and in response, processing all of the content management-related requests that are received prior to the first request together as the batch, comprising: sending all of the content management-related requests that are received prior to the first request to the database in a single database interaction; and delaying to perform the first request and any content management-related requests that are received after the first request.
 8. The content management system of claim 7, wherein the processor is further configured to receive from a requesting entity with which the plurality of content management-related requests is associated an indication that the plurality of content management-related requests is associated.
 9. The content management system of claim 8, wherein the indication comprises an explicit indication that the plurality of content management-related requests comprises a batch.
 10. The content management system of claim 7, wherein processing the content management-related requests that are received prior to the first request together as a batch is performed in the event it is determined to flush the received content management-related requests even if an end of the batch has not been reached or indicated.
 11. The content management system of claim 7, wherein the processor is further configured to: determine whether a query is received, wherein the query implicates an object that is affected by an as yet unperformed received request; and in response to receiving the query, process only the content management-related requests that are received prior to the first request together as the batch.
 12. The content management system of claim 7, wherein the processor is further configured to: store the plurality of content management-related requests; determine whether an indication that a memory being used to store the content management-related requests is full or nearly full; and in response to receiving the indication, process only a portion of the plurality of content management-related requests together as the batch.
 13. A computer program product for managing stored content, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for: receiving a plurality of content management-related requests originally assigned as part of a batch, wherein the batch of content management-related requests comprises requests that are originally batched together to be sent to a database before a change to the database corresponding to the requests is committed; and determining that a first request among the plurality of received content management-related requests has a dependency upon at least one of the plurality of content management-related requests that is received prior to the first request, and in response, processing all of the content management-related requests that are received prior to the first request together as the batch, comprising: sending all of the content management-related requests that are received prior to the first request to the database in a single database interaction; and delaying to perform the first request and any content management-related requests that are received after the first request.
 14. The computer program product recited in claim 13, further comprising computer instructions for receiving from a requesting entity with which the plurality of content management-related requests is associated an indication that the plurality of content management-related requests is associated.
 15. The computer program product recited in claim 14, wherein the indication comprises an explicit indication that the plurality of content management-related requests comprises a batch.
 16. The computer program product of claim 13, wherein processing all of the content management-related requests that are received prior to the first request together as a batch is performed in the event it is determined to flush the received content management-related requests even if an end of the batch has not been reached or indicated.
 17. The computer program product of claim 13, further comprising computer instructions for: determining whether a query is received, wherein the query implicates an object that is affected by an as yet unperformed received request; and in response to receiving the query, processing only the content management-related requests that are received prior to the first request together as the batch.
 18. The computer program product of claim 13, further comprising computer instructions for: storing the plurality of content management-related requests; determining whether an indication that a memory being used to store the content management-related requests is full or nearly full; and in response to receiving the indication, processing only a portion of the plurality of content management-related requests together as the batch.